
Github Action for Buildbot Builders #5

Closed
kszucs wants to merge 52 commits into master from actions-test

Conversation


@kszucs kszucs commented Sep 28, 2019

No description provided.

jorisvandenbossche and others added 30 commits September 24, 2019 08:43
Follow-up on apache#5462 to also apply this fix for ChunkedArray.

Closes apache#5471 from jorisvandenbossche/ARROW-6652-chunked-array-timezone and squashes the following commits:

89d0044 <Joris Van den Bossche> add helper function
5122451 <Joris Van den Bossche> ARROW-6652:  Fix ChunkedArray.to_pandas to retain timezone

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
Closes apache#5439 from pitrou/ARROW-3777-slow-fs and squashes the following commits:

2ca64c5 <Antoine Pitrou> Try to fix Windows failure
b02a8c5 <Antoine Pitrou> ARROW-3777:  Add Slow input streams and slow filesystem

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
…cal plan

This PR adds the binary expression to the new physical execution plan, with support for comparison operators (`<`, `<=`, `>`, `>=`, `==`, `!=`) and boolean operators `AND` and `OR`.

Other binary expressions, such as math expressions, will be added in a future PR.

Closes apache#5478 from andygrove/ARROW-6669 and squashes the following commits:

83bfa77 <Andy Grove> formatting
af8d298 <Andy Grove> address PR feedback
9ad3b7f <Andy Grove> formatting
bb82a24 <Andy Grove> use expect() instead of unwrap() when downcasting arrays
9b94cc8 <Andy Grove> Implement binary expression with support for comparison and boolean operators

Authored-by: Andy Grove <andygrove73@gmail.com>
Signed-off-by: Paddy Horan <paddyhoran@hotmail.com>
… to Parquet

https://issues.apache.org/jira/browse/ARROW-6187

Closes apache#5436 from jorisvandenbossche/ARROW-6187-parquet-extension-type and squashes the following commits:

e56164b <Joris Van den Bossche> expose constants in extension_type.h
61d245e <Joris Van den Bossche> clean-up chunked array creation
6b2f190 <Joris Van den Bossche> recreate extension type on read
bdda0f7 <Joris Van den Bossche> test that extension metadata is already saved
abf2a2f <Joris Van den Bossche> add python test
fb4b810 <Joris Van den Bossche> ARROW-6187:  Fallback to storage type when writing ExtensionType to Parquet

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Also adds and improves other C++ doc items.

Closes apache#5487 from pitrou/ARROW-6629-cpp-fs-docs and squashes the following commits:

d47a008 <Antoine Pitrou> Update docs/source/cpp/io.rst
895f04a <Antoine Pitrou> Try to fix Sphinx error on Travis
c40e0e2 <Antoine Pitrou> ARROW-6629:   Add filesystem docs

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
…atch

See the readme and the new tests for example output.

This patch also fixes a validation bug in `dictionary()`, aliases that to `DictionaryType$create`, and adds default arguments.

Closes apache#5492 from nealrichardson/print-methods and squashes the following commits:

c092d30 <Neal Richardson> Merge branch 'print-methods' of github.com:nealrichardson/arrow into print-methods
02afb89 <Neal Richardson> Prettier printing of dictionary type's ordered attribute
5750100 <Neal Richardson> indices in the docs too
6be0328 <Neal Richardson> indices
2d4e744 <Neal Richardson> Add/improve print methods for Array, ChunkedArray, Table, RecordBatch

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
https://issues.apache.org/jira/browse/ARROW-6674

Closes apache#5489 from jorisvandenbossche/ARROW-6674-test-warnings and squashes the following commits:

2a2bb14 <Joris Van den Bossche> ARROW-6674:  Fix or ignore the test warnings

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
…of StructArray

https://issues.apache.org/jira/browse/ARROW-6158

Closes apache#5488 from jorisvandenbossche/ARROW-6158-struct-array-validation and squashes the following commits:

7573781 <Joris Van den Bossche> ARROW-6158:  Validate child array types with type fields of StructArray

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
…t be base64-encoded to be UTF-8 compliant

I have added a simple base64 implementation (Zlib license) to arrow/vendored from

https://github.com/ReneNyffenegger/cpp-base64

Closes apache#5493 from wesm/ARROW-6678 and squashes the following commits:

c058e86 <Wes McKinney> Simplify, add MSVC exports
06f75cd <Wes McKinney> Fix Python unit test that needs to base64-decode now
eabb121 <Wes McKinney> Fix LICENSE.txt, add iwyu export
b3a584a <Wes McKinney> Add vendored base64 C++ implementation and ensure that Thrift KeyValue in Parquet metadata is UTF-8

Authored-by: Wes McKinney <wesm+git@apache.org>
Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
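The encoding requirement above can be illustrated with a short, self-contained Python sketch (the byte string is made up for illustration): arbitrary binary statistics are not valid UTF-8, but their base64 encoding is pure ASCII and therefore always UTF-8 compliant, and it round-trips losslessly.

```python
import base64

# Arbitrary binary statistics bytes are not necessarily valid UTF-8,
# so they cannot be stored directly in a UTF-8 string field.
raw = b"\xff\x00\x80binary-stats"

# Encoding to base64 yields pure ASCII, which is always valid UTF-8.
encoded = base64.b64encode(raw).decode("ascii")

# Decoding recovers the original bytes exactly.
decoded = base64.b64decode(encoded)
assert decoded == raw
```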
…n" operator

This PR implements the physical execution plan for the selection operator (the WHERE clause in a SQL query).

In order to have working tests, I also had to implement some subset of expressions (column reference, literal value, comparison expressions, and CAST). However, the goal of this PR is not to add complete support for all expressions but to implement the Selection operator. I will create separate JIRA/PRs for adding support for other expressions and data types in the physical query plan.

Closes apache#5320 from andygrove/ARROW-6089 and squashes the following commits:

6cad327 <Andy Grove> Implement selection operator

Authored-by: Andy Grove <andygrove73@gmail.com>
Signed-off-by: Andy Grove <andygrove73@gmail.com>
…quet

_build_nested_path has a reference cycle: the closed-over function refers to its parent cell, which in turn refers back to the function. Address this by clearing the reference to the function from the parent cell before returning.

open_dataset_file is partialed with self inside the ParquetFile class, which keeps the instance alive. Avoid this by using a weakref instead.

Closes apache#5476 from AaronOpfer/master and squashes the following commits:

f4909e0 <Wes McKinney> Fix flakes
883ab86 <Aaron Opfer> ARROW-6667:  remove cyclical object references in pyarrow.parquet

Lead-authored-by: Aaron Opfer <aaron.opfer@chicagotrading.com>
Co-authored-by: Wes McKinney <wesm+git@apache.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
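The weakref half of the fix can be sketched in a few lines. This is an illustrative stand-in, not pyarrow's actual code: `Holder` and its methods are hypothetical names showing how capturing `self` strongly keeps the object alive, while capturing a `weakref` does not.

```python
import weakref

class Holder:
    """Hypothetical stand-in for a class like ParquetFile."""

    def make_opener_with_cycle(self):
        # BAD: the closure holds a strong reference to `self`,
        # so the object cannot be freed while the closure lives.
        def opener():
            return self
        return opener

    def make_opener_with_weakref(self):
        # GOOD: only a weak reference is captured; the object can be
        # collected once all external strong references are gone.
        ref = weakref.ref(self)
        def opener():
            return ref()  # resolves to None after collection
        return opener

h = Holder()
opener = h.make_opener_with_weakref()
assert opener() is h
del h
# With the last strong reference dropped, the weakref resolves to None.
assert opener() is None
```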
- Adds StatusDetail to the docs
- Fixes the Doxygen config to work with `ARROW_FLIGHT_EXPORT`
- Adds error codes to the format docs (though they're not part of the formal format)
- Touches up some docstrings and adds missing classes to the API docs
- Add a basic description of how to set up a Flight server/client in C++

Closes apache#5491 from lihalite/flight-docs and squashes the following commits:

2948983 <David Li> ARROW-6677:  Document Flight in C++

Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
…taframe

- add scanReverse() to dataFrame and filteredDataframe
- add tests for scanReverse()

Closes apache#5480 from mmaclach/master and squashes the following commits:

01faae8 <mmaclach> JS: scanReverse

Authored-by: mmaclach <mmaclachlan@ccri.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
I fixed this in R, though I wonder if that's not enough. At a minimum, the C++ docs should note this requirement (@pitrou maybe you know the best place in the docs to add this?), and I still think it would be nice if all of this path normalization were handled in C++ (cf. https://issues.apache.org/jira/browse/ARROW-6324).

Closes apache#5445 from nealrichardson/win-fs-fix and squashes the following commits:

515d710 <Neal Richardson> Rename functions
1775a6d <Neal Richardson> Fix two other absolute paths I missed
0dce0d7 <Neal Richardson> Munge paths directly on windows
f03815e <Neal Richardson> Normalize paths for filesystem API on Windows

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
…ata sources

I discovered this last minute while running manual tests. I have been able to run parallel queries against parquet files using this branch as a dependency.

Closes apache#5494 from andygrove/ARROW-6086 and squashes the following commits:

77bee15 <Andy Grove> Replace unwrap with Result combinator
cd11b97 <Andy Grove> don't panic
c751753 <Andy Grove> Add support for partitioned parquet data sources
25eaf45 <Andy Grove> Move build_file_list into common module

Authored-by: Andy Grove <andygrove73@gmail.com>
Signed-off-by: Andy Grove <andygrove73@gmail.com>
cf. https://github.com/jeroen/autobrew/blob/gh-pages/LICENCE.txt

Closes apache#5501 from nealrichardson/autobrew-license and squashes the following commits:

3e790f5 <Neal Richardson> MIT license for autobrew

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
Closes apache#5500 from pitrou/ARROW-6630-file-format-docs and squashes the following commits:

aa5c57d <Antoine Pitrou> ARROW-6630:  Document C++ file formats

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Related to [ARROW-6472](https://issues.apache.org/jira/browse/ARROW-6472).

If we use the visitor API this way:

    RangeEqualsVisitor visitor = new RangeEqualsVisitor(vector1, vector2);
    vector3.accept(visitor, range)

then if vector1/vector2 are, say, StructVectors and vector3 is an IntVector, things can go bad: we'll use compareBaseFixedWidthVectors() and perform wrong type-casts for vector1/vector2.

Discussions see:
apache#5195 (comment)
https://issues.apache.org/jira/browse/ARROW-6472

Closes apache#5483 from tianchen92/ARROW-6472 and squashes the following commits:

3d3d295 <tianchen> add test
12e4aa2 <tianchen> ARROW-6472:  ValueVector#accept may has potential cast exception

Authored-by: tianchen <niki.lj@alibaba-inc.com>
Signed-off-by: Pindikura Ravindra <ravindra@dremio.com>
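The failure mode above can be sketched language-agnostically: guard the visitor with an explicit type check before comparing, instead of casting unconditionally. This is a minimal illustration with made-up class names, not the Arrow Java implementation.

```python
class IntVector: ...
class StructVector: ...

class RangeEqualsVisitor:
    """Illustrative sketch: verify the visited vector's type matches
    the vectors the visitor was constructed with before comparing."""

    def __init__(self, left, right):
        self.left, self.right = left, right

    def visit(self, vector):
        # Guard against the mismatch described above: a visitor built
        # for StructVectors must reject an IntVector rather than cast it.
        if type(vector) is not type(self.left):
            raise TypeError(
                f"visitor built for {type(self.left).__name__}, "
                f"got {type(vector).__name__}")
        return True  # real range comparison elided

v = RangeEqualsVisitor(StructVector(), StructVector())
try:
    v.visit(IntVector())
except TypeError:
    pass  # mismatched types are rejected instead of mis-cast
```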
Initial support for array reader. List and map support will come later.

Closes apache#5378 from liurenjie1024/arrow-4218 and squashes the following commits:

433abab <Renjie Liu> Remove unwraps with result
6407dee <Renjie Liu> Fix format
4dd9a01 <Renjie Liu> Initial support for array reader
215f73b <Renjie Liu> struct array reader
2e898ff <Renjie Liu> test done for primitive array reader

Authored-by: Renjie Liu <liurenjie2008@gmail.com>
Signed-off-by: Andy Grove <andygrove73@gmail.com>
Closes apache#5503 from andygrove/ARROW-6687 and squashes the following commits:

c0ded56 <Andy Grove> Bug fix in DataFusion Parquet reader

Authored-by: Andy Grove <andygrove73@gmail.com>
Signed-off-by: Andy Grove <andygrove73@gmail.com>
Closes apache#5499 from alippai/patch-1 and squashes the following commits:

d2fd03e <Ádám Lippai> Update README.md

Authored-by: Ádám Lippai <adam@rigo.sk>
Signed-off-by: Andy Grove <andygrove73@gmail.com>
…able

This is needed to use correct download URL for RC.

Closes apache#5506 from kou/packaging-linux-restore-arrow-version and squashes the following commits:

7face8a <Sutou Kouhei>  Restore ARROW_VERSION environment variable

Authored-by: Sutou Kouhei <kou@clear-code.com>
Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
Related to [ARROW-6709](https://issues.apache.org/jira/browse/ARROW-6709).
Currently, in several consumers, currentIndex is incremented only when the ResultSet value is not null.
However, if the ResultSet contains null values, the resulting Arrow vector's valueCount is incorrect.

Closes apache#5511 from tianchen92/ARROW-6709 and squashes the following commits:

b1e9d5a <tianchen> ARROW-6709:  Jdbc adapter currentIndex should increment when value is null

Authored-by: tianchen <niki.lj@alibaba-inc.com>
Signed-off-by: Micah Kornfield <emkornfield@gmail.com>
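The off-by-nulls problem can be shown with a small sketch: the write index must advance for every row, null or not, so the final count matches the number of rows. The function and its return shape are illustrative, not the JDBC adapter's actual API.

```python
def consume(values):
    """Illustrative sketch of the fix: the index advances for every
    row, including nulls, so the value count matches the row count."""
    vector, validity = [], []
    index = 0
    for v in values:
        if v is None:
            vector.append(0)        # placeholder slot for the null
            validity.append(False)
        else:
            vector.append(v)
            validity.append(True)
        index += 1                  # increment even when the value is null
    return index, vector, validity

count, vec, valid = consume([1, None, 3])
assert count == 3                   # the null row is counted
assert valid == [True, False, True]
```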
Construct tree structure from std::vector<fs::FileStats>, following the path directory hierarchy.

Closes apache#5430 from fsaintjacques/ARROW-6606-path-tree and squashes the following commits:

43d19fa <François Saint-Jacques> Address comments
60b5945 <François Saint-Jacques> Simplify implementation
109ea85 <François Saint-Jacques> ARROW-6606:  Add PathTree tree structure

Authored-by: François Saint-Jacques <fsaintjacques@gmail.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
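The idea behind building a tree from a flat list of file paths can be sketched with nested dicts. Names here are illustrative; this is not the C++ PathTree API.

```python
def build_path_tree(paths):
    """Arrange a flat list of paths into a tree that follows the
    directory hierarchy (sketch of the PathTree idea)."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.strip("/").split("/"):
            # Descend, creating intermediate directories as needed.
            node = node.setdefault(part, {})
    return tree

tree = build_path_tree(["a/b/c", "a/b/d", "a/e"])
assert tree == {"a": {"b": {"c": {}, "d": {}}, "e": {}}}
```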
…lity

https://issues.apache.org/jira/browse/ARROW-6683

@wesm is this more or less what you were thinking?

Closes apache#5498 from jorisvandenbossche/ARROW-6683-fastparquet-cross-testing and squashes the following commits:

4c1e3aa <Joris Van den Bossche> add comment
1d35f3b <Joris Van den Bossche> add fastparquet mark
c5d6161 <Joris Van den Bossche> ARROW-6683:  Test for fastparquet <-> pyarrow cross-compatibility

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
…ntegration tests

Arrow Java Writer now requires an IpcOption for some APIs, this patch fixes the compilation to run Spark Integration tests.

Closes apache#5465 from BryanCutler/spark-integration-patch-ARROW-6429 and squashes the following commits:

918ab91 <Bryan Cutler> Remove redundant message
655937d <Bryan Cutler> Changes to Spark integration test config
dd2483f <Bryan Cutler> Add patch to rat excludes
48b2eac <Bryan Cutler> Adding patch to fix Spark compilation with IpcOption

Authored-by: Bryan Cutler <cutlerb@gmail.com>
Signed-off-by: Bryan Cutler <cutlerb@gmail.com>
Rust 1.40.0-nightly just got released and caused builds to start failing

Closes apache#5519 from andygrove/ARROW-6716 and squashes the following commits:

5301d7b <Andy Grove> trigger rebuild
b651d7b <Andy Grove> Use 1.40.0-nightly-2019-09-25

Authored-by: Andy Grove <andygrove73@gmail.com>
Signed-off-by: Paddy Horan <paddyhoran@hotmail.com>
…row specific)

This adds parameters to `write_parquet()` to control compression, whether to use dictionary, etc ... on top of the C++ classes `parquet::WriterProperties` and `parquet::ArrowWriterProperties` e.g.

```r
write_parquet(tab, file, compression = "gzip", compression_level = 7)
```

Closes apache#5451 from romainfrancois/ARROW-6532/write_parquet_compression and squashes the following commits:

413dd41 <Romain Francois> test make_valid_version()
50555f8 <Romain Francois> rename arguments to `x` and `sink`
9aff79b <Romain Francois> implement ==.Object that calls $Equals instead of implementing for each class.
ecd9218 <Romain Francois> rework documentation for write_parquet()
56dac33 <Romain Francois> Move read_parquet() and write_parquet() to top of the file
45ec63b <Romain François> Update r/R/parquet.R
66c51fd <Romain Francois> added all.equal.Object() that uses ==
c5549de <Romain Francois> Test ==.Table
5ade52d <Romain Francois> wrong length for use_dictionary and write_statistics
00cc214 <Romain Francois> abstract various ParquetWriterPropertiesBuilder$set_*() methods
1fdcc0b <Romain Francois> suggestsions from @nealrichardson
9bee8de <Romain Francois> define and use internal make_valid_version() function
004cf90 <Romain Francois> Make compression_from_name() vectorized
86d9ff4 <Romain Francois> Remove the _ from builder classes
6c4f003 <Romain Francois> add test helper so that we actually can test parquet roundtrip
d318a66 <Romain Francois> ==.Table
7f1c184 <Romain Francois> align arguments following tidyverse style guide
72caaab <Romain Francois> using assert_that()
738ea6e <Romain Francois> Remove $default() methods and use $create() with default arguments instead.
1166264 <Romain Francois> using make_valid_time_unit()
4055f67 <Romain Francois> More flexible arguments use_dictionary= and write_statistics=
2f2ae00 <Romain Francois> More flexible compression= and compression_level=
1e3b5b6 <Romain Francois> document()
2dd2cb9 <Romain Francois> + compression_level= in write_parquet()
b8337e1 <Romain Francois> lint
fa8990b <Romain Francois> Expose options from ParquetWriterProperties and ParquetArrowWriterProperties to write_parquet()
09ea0ad <Romain Francois> + ParquetWriterProperties$create() and associated ParquetWriterProperties_Builder class skeleton
1b84ad4 <Romain Francois> Exposing classes parquet::arrow::ArrowWriterProperties and parquet::arrow::WriterProperties to R side
0e09ac8 <Romain Francois> lint
aa34095 <Romain Francois> passing down the right stream
9ed32b6 <Romain Francois> Make write_parquet() generic, internal impl using streams rather than file path for more flexibility

Lead-authored-by: Romain Francois <romain@rstudio.com>
Co-authored-by: Romain François <romain@purrple.cat>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Closes apache#5514 from nealrichardson/fix-lint and squashes the following commits:

ea8b8e7 <Neal Richardson> Note that clang-format-7 is required
f83f0e1 <Neal Richardson> Incorporate @kou's suggestion
6bcc063 <Neal Richardson> Note how to use lint.sh and have it look for clang-format rather than hard-code its location

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
I noticed this now-defunct call to `table()` while reviewing another PR. We clearly weren't testing this case because if you were to pass a data.frame in, you'd get a segfault. This patch adds tests and fixes the issue.

Closes apache#5518 from nealrichardson/record-batch-writer-fix and squashes the following commits:

afad9fe <Neal Richardson> Fix untested RecordBatchWriter case

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
@kszucs kszucs changed the title from "Update main.yml" to "Github Action for Buildbot Builders" Sep 28, 2019
@kszucs kszucs closed this Sep 28, 2019
kszucs pushed a commit that referenced this pull request Feb 24, 2020
…comments.

The Reset method allows the data structures to be re-used so they don't have to be allocated over and over again.

Closes apache#6430 from richardartoul/ra/merge-upstream and squashes the following commits:

5a08281 <Richard Artoul> Add license to test file
d76be05 <Richard Artoul> Add test for data reset
d102b1f <Richard Artoul> Add tests
d3e6e67 <Richard Artoul> cleanup comments
c8525ae <Richard Artoul> Add Reset method to int array (#5)
489ca25 <Richard Artoul> Fix array.setData() to retain before release (#4)
88cd05f <Richard Artoul> Add reset method to Data (#3)
6d1b277 <Richard Artoul> Add Reset() method to String array (#2)
dca2303 <Richard Artoul> Add Reset method to buffer and cleanup comments (#1)

Lead-authored-by: Richard Artoul <richard.artoul@datadoghq.com>
Co-authored-by: Richard Artoul <richardartoul@gmail.com>
Signed-off-by: Sebastien Binet <binet@cern.ch>
kszucs pushed a commit that referenced this pull request May 11, 2020
This PR enables tests for `ARROW_COMPUTE`, `ARROW_DATASET`, `ARROW_FILESYSTEM`, `ARROW_HDFS`, `ARROW_ORC`, and `ARROW_IPC` (default on). apache#7131 enabled a minimal set of tests as a starting point.

I confirmed that these tests pass locally with the current master. In the current TravisCI environment, we cannot see this result due to a lot of error messages in `arrow-utility-test`.

```
$ git log | head -1
commit ed5f534
% ctest
...
      Start  1: arrow-array-test
 1/51 Test  #1: arrow-array-test .....................   Passed    4.62 sec
      Start  2: arrow-buffer-test
 2/51 Test  #2: arrow-buffer-test ....................   Passed    0.14 sec
      Start  3: arrow-extension-type-test
 3/51 Test  #3: arrow-extension-type-test ............   Passed    0.12 sec
      Start  4: arrow-misc-test
 4/51 Test  #4: arrow-misc-test ......................   Passed    0.14 sec
      Start  5: arrow-public-api-test
 5/51 Test  #5: arrow-public-api-test ................   Passed    0.12 sec
      Start  6: arrow-scalar-test
 6/51 Test  #6: arrow-scalar-test ....................   Passed    0.13 sec
      Start  7: arrow-type-test
 7/51 Test  #7: arrow-type-test ......................   Passed    0.14 sec
      Start  8: arrow-table-test
 8/51 Test  #8: arrow-table-test .....................   Passed    0.13 sec
      Start  9: arrow-tensor-test
 9/51 Test  #9: arrow-tensor-test ....................   Passed    0.13 sec
      Start 10: arrow-sparse-tensor-test
10/51 Test #10: arrow-sparse-tensor-test .............   Passed    0.16 sec
      Start 11: arrow-stl-test
11/51 Test #11: arrow-stl-test .......................   Passed    0.12 sec
      Start 12: arrow-concatenate-test
12/51 Test #12: arrow-concatenate-test ...............   Passed    0.53 sec
      Start 13: arrow-diff-test
13/51 Test #13: arrow-diff-test ......................   Passed    1.45 sec
      Start 14: arrow-c-bridge-test
14/51 Test #14: arrow-c-bridge-test ..................   Passed    0.18 sec
      Start 15: arrow-io-buffered-test
15/51 Test #15: arrow-io-buffered-test ...............   Passed    0.20 sec
      Start 16: arrow-io-compressed-test
16/51 Test #16: arrow-io-compressed-test .............   Passed    3.48 sec
      Start 17: arrow-io-file-test
17/51 Test #17: arrow-io-file-test ...................   Passed    0.74 sec
      Start 18: arrow-io-hdfs-test
18/51 Test #18: arrow-io-hdfs-test ...................   Passed    0.12 sec
      Start 19: arrow-io-memory-test
19/51 Test #19: arrow-io-memory-test .................   Passed    2.77 sec
      Start 20: arrow-utility-test
20/51 Test #20: arrow-utility-test ...................***Failed    5.65 sec
      Start 21: arrow-threading-utility-test
21/51 Test #21: arrow-threading-utility-test .........   Passed    1.34 sec
      Start 22: arrow-compute-compute-test
22/51 Test #22: arrow-compute-compute-test ...........   Passed    0.13 sec
      Start 23: arrow-compute-boolean-test
23/51 Test #23: arrow-compute-boolean-test ...........   Passed    0.15 sec
      Start 24: arrow-compute-cast-test
24/51 Test #24: arrow-compute-cast-test ..............   Passed    0.22 sec
      Start 25: arrow-compute-hash-test
25/51 Test #25: arrow-compute-hash-test ..............   Passed    2.61 sec
      Start 26: arrow-compute-isin-test
26/51 Test #26: arrow-compute-isin-test ..............   Passed    0.81 sec
      Start 27: arrow-compute-match-test
27/51 Test #27: arrow-compute-match-test .............   Passed    0.40 sec
      Start 28: arrow-compute-sort-to-indices-test
28/51 Test #28: arrow-compute-sort-to-indices-test ...   Passed    3.33 sec
      Start 29: arrow-compute-nth-to-indices-test
29/51 Test #29: arrow-compute-nth-to-indices-test ....   Passed    1.51 sec
      Start 30: arrow-compute-util-internal-test
30/51 Test #30: arrow-compute-util-internal-test .....   Passed    0.13 sec
      Start 31: arrow-compute-add-test
31/51 Test #31: arrow-compute-add-test ...............   Passed    0.12 sec
      Start 32: arrow-compute-aggregate-test
32/51 Test #32: arrow-compute-aggregate-test .........   Passed   14.70 sec
      Start 33: arrow-compute-compare-test
33/51 Test #33: arrow-compute-compare-test ...........   Passed    7.96 sec
      Start 34: arrow-compute-take-test
34/51 Test #34: arrow-compute-take-test ..............   Passed    4.80 sec
      Start 35: arrow-compute-filter-test
35/51 Test #35: arrow-compute-filter-test ............   Passed    8.23 sec
      Start 36: arrow-dataset-dataset-test
36/51 Test #36: arrow-dataset-dataset-test ...........   Passed    0.25 sec
      Start 37: arrow-dataset-discovery-test
37/51 Test #37: arrow-dataset-discovery-test .........   Passed    0.13 sec
      Start 38: arrow-dataset-file-ipc-test
38/51 Test #38: arrow-dataset-file-ipc-test ..........   Passed    0.21 sec
      Start 39: arrow-dataset-file-test
39/51 Test #39: arrow-dataset-file-test ..............   Passed    0.12 sec
      Start 40: arrow-dataset-filter-test
40/51 Test #40: arrow-dataset-filter-test ............   Passed    0.16 sec
      Start 41: arrow-dataset-partition-test
41/51 Test #41: arrow-dataset-partition-test .........   Passed    0.13 sec
      Start 42: arrow-dataset-scanner-test
42/51 Test #42: arrow-dataset-scanner-test ...........   Passed    0.20 sec
      Start 43: arrow-filesystem-test
43/51 Test #43: arrow-filesystem-test ................   Passed    1.62 sec
      Start 44: arrow-hdfs-test
44/51 Test #44: arrow-hdfs-test ......................   Passed    0.13 sec
      Start 45: arrow-feather-test
45/51 Test #45: arrow-feather-test ...................   Passed    0.91 sec
      Start 46: arrow-ipc-read-write-test
46/51 Test #46: arrow-ipc-read-write-test ............   Passed    5.77 sec
      Start 47: arrow-ipc-json-simple-test
47/51 Test #47: arrow-ipc-json-simple-test ...........   Passed    0.16 sec
      Start 48: arrow-ipc-json-test
48/51 Test #48: arrow-ipc-json-test ..................   Passed    0.27 sec
      Start 49: arrow-json-integration-test
49/51 Test #49: arrow-json-integration-test ..........   Passed    0.13 sec
      Start 50: arrow-json-test
50/51 Test #50: arrow-json-test ......................   Passed    0.26 sec
      Start 51: arrow-orc-adapter-test
51/51 Test #51: arrow-orc-adapter-test ...............   Passed    1.92 sec

98% tests passed, 1 tests failed out of 51

Label Time Summary:
arrow-tests      =  27.38 sec (27 tests)
arrow_compute    =  45.11 sec (14 tests)
arrow_dataset    =   1.21 sec (7 tests)
arrow_ipc        =   6.20 sec (3 tests)
unittest         =  79.91 sec (51 tests)

Total Test time (real) =  79.99 sec

The following tests FAILED:
	 20 - arrow-utility-test (Failed)
Errors while running CTest
```

Closes apache#7142 from kiszk/ARROW-8754

Authored-by: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
kszucs pushed a commit that referenced this pull request Apr 7, 2021
From a deadlocked run...

```
#0  0x00007f8a5d48dccd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f8a5d486f05 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f8a566e7e89 in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#3  0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#4  0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#5  0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#6  0x00007f8a566e827d in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#7  0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#8  0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#9  0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
#10 0x00007f8a566e74b1 in arrow::fs::(anonymous namespace)::TreeWalker::DoWalk() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
```

The callback `ListObjectsV2Handler` is being called recursively, and the mutex is non-reentrant, hence the deadlock.

To fix it, I got rid of the mutex on `TreeWalker` by using `arrow::util::internal::TaskGroup` instead of manually tracking the number/status of in-flight requests.

Closes apache#9842 from westonpace/bugfix/arrow-12040

Lead-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
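The root cause above, re-entering a non-reentrant lock from the same thread, can be demonstrated in a few lines. This sketch uses Python's `threading` primitives purely for illustration; the actual fix removed the mutex in favor of a task group rather than switching to a reentrant lock.

```python
import threading

# A plain Lock is non-reentrant: re-acquiring it on the same thread
# (e.g. from a callback invoked recursively while the lock is held)
# would block forever. A non-blocking attempt shows the failure
# without hanging.
lock = threading.Lock()
lock.acquire()
assert lock.acquire(blocking=False) is False  # re-entry denied
lock.release()

# An RLock permits the same thread to re-enter; each acquire must be
# matched by a release.
rlock = threading.RLock()
rlock.acquire()
assert rlock.acquire(blocking=False) is True  # re-entry allowed
rlock.release()
rlock.release()
```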